The continuum of Drosophila embryonic development at single-cell resolution

INTRODUCTION: Single-cell technologies are a powerful means of studying metazoan development, enabling comprehensive surveys of cellular diversity at profiled time points and shedding light on the dynamics of regulatory element activity and gene expression changes during the in vivo emergence of each cell type. However, nearly all such whole-embryo atlases of embryogenesis remain limited by sampling density—i.e., the number of discrete time points at which individual embryos are harvested and cells or nuclei are collected. Given the rapidity with which molecular and cellular programs unfold, this limits the resolution at which regulatory transitions can be characterized. For example, in the mouse, there are typically 6 to 24 hours between sampled embryonic time points—gaps within which massive molecular and morphological changes take place. RATIONALE: To construct an ungapped representation of embryogenesis in vivo, we would ideally sample embryos continuously. Although this is not practical for most model organisms, it is potentially possible in Drosophila melanogaster, where collections of timed and yet somewhat asynchronous embryos are easy to obtain, such that, at least in principle, one can achieve arbitrarily high temporal resolution. Drosophila could therefore serve as a test case to develop a framework for the inference of continuous regulatory and cellular trajectories of in vivo embryogenesis. Because Drosophila is a preeminent model organism that has yielded many advances in the biological and biomedical sciences, obtaining a single-cell atlas of Drosophila embryogenesis is also an important goal in itself. This includes its embryonic development, where the use of this model in conjunction with powerful genetic tools has transformed our understanding of the mechanisms by which developmental complexity is achieved, in addition to uncovering many general principles of both genetic and epigenetic gene regulation. RESULTS: We profiled chromatin accessibility in almost 1 million nuclei and gene expression in half a million nuclei from eleven overlapping windows spanning the entirety of embryogenesis (0 to 20 hours). To exploit the developmental asynchronicity of embryos from each collection window, we applied deep neural network-based predictive modeling to more-precisely predict the developmental age of each nucleus within the dataset, resulting in continuous, multimodal views of molecular and cellular transitions in absolute time. With these data, the dynamics of enhancer usage and gene expression can be explored within and across lineages at the scale of minutes, including for precise transitions like zygotic genome activation. CONCLUSION: This Drosophila embryonic atlas broadly informs the orchestration of cellular states during the most dynamic stages in the life cycle of metazoan organisms. The inclusion of predicted nuclear ages will facilitate the exploration of the precise time points at which genes become active in distinct tissues as well as how chromatin is remodeled across time.


sci-RNA-seq3 library construction and sequencing
We performed sci-RNA-seq3 as previously described (1). Several experiments were done and each one included samples from multiple time windows. Associated time-windows for nuclei batches were tracked through specific wells. Additionally, for a subset of the data we included mouse nuclei to serve as a control to determine the rate of cell doublets.
Read processing, nuclei filtering, and doublet removal The read alignment and gene count matrix generation was performed as previously described (1,55). With the single-cell gene count matrix cells with fewer than 250 UMIs, more than 10,000 UMIs, reads mapping to more than 7000 genes, or more than 10% of read counts mapping to ribosomal genes were excluded. Each nuclei was mapped to its original time window from which it was extracted by using the RT barcode.
We performed standard processing of the data split by experiments as recommended by Seurat v3 (56) documentation including NormalizeData, FindVariableFeatures (with method set to 'vst'), ScaleData, RunPCA, RunUMAP (with dims set to 1:50 and n.components to 2), FindNeighbors (with reduction set to UMAP and dims set to 1:2), FindClusters. For detection and removal of doublet nuclei we relied on an iterative strategy described in (1). We used a modified version of DoubletFinder (57) that could handle large data sets to document for each cell the proportion of neighbors that were simulated doublets. This version uses the first 30 principal components, along with pN of 0.2 and pK of 0.005. The number of nuclei classified as doublets was based on estimates of the count of doublets (barnyard estimate multiplied by recovered nuclei) and were chosen to be the top rank nuclei with the greatest proportion of nuclei that were simulated doublets. Unfortunately, this process alone did not remove all putative doublets that may have been lost among different clusters of cells. Therefore, following clustering of the global dataset, we individually processed each cluster identified and sub-clustered the data. We then eliminated subclusters that were at least 15% classified as doublets. Following removal of all the initial nuclei identified as doublets and subclusters with a large proportion of doublets, we reprocessed the global dataset.
Dimensionality reduction, clustering, and identifying cluster-specific marker genes The standard seurat processing pipeline, as described in the previous section, was performed on each non-overlapping inferred age window separately. We found that the default Seurat clustering resolution parameter did not capture the dynamics of the presence of different cell types across development. So for each time window we clustered the data with a variety of resolutions (from 0.1 to 1.5 in increments of 0.3) and then computed the within cluster sum of squares (WSS). Visualizing WSS across increasing resolutions we chose the resolution in which there was a visible plateau in the decrease of WSS. The selected set of resolutions are listed in Table S11. Finally we clustered the data for each inferred age time window using the per-window choice of resolution parameters. From these clusters we used Seurat's FindMarkers to iteratively loop through all clusters and identify marker genes (with only.pos=T) which were used for cluster annotation. For the set of marker genes used for lineage annotation we excluded genes with 'log.FC < 0.25' or 'min.pct < 25'.

Annotating cell types and tissues
For cluster annotation, we used the Berkeley Drosophila Genome Project (BDGP) database, which includes gene expression patterns of approximately 8600 genes in drosophila staged embryos as detected by in situ hybridization (20,21,58). The BDGP database gives a stagespecific expression pattern ("term") for each tested gene during embryogenesis. We used Fisher's test to look for enrichment of BDGP gene expression terms in each cluster's marker genes. Top ten terms per cluster were examined. To pick a specific term out of the top ten, we further examined the BDGP terms of the top 20 marker genes for each cluster.

Clustering genes
Inferred-time associated co-regulated genes specific to certain germ layers were determined with an unsupervised clustering approach. First, we subset the full seurat object that included cells from all time windows to only those cells annotated as a specific germ layer. For example, in analyses shown in Fig. 4, we focused on mesoderm. As described previously in the standard seurat processing pipeline, we scaled gene expression values and subsetted to the 5000 most highly variable genes. We constructed 100 bins of roughly equal numbers of cells across inferred time, and then smoothed expression values by computing the average expression of each gene in each window, after trimming 10% of the outlier genes, along with all expression values for cells in time proximal bins from a sliding window across inferred time. Once again we subsetted the genes to the top 2,000 that were highly variable across these smoothed time windows and then these variable expressed genes were scaled and centered to have mean=0 and sd=1. Finally, we performed dynamic time warp clustering with the 'tsclust' function from dtwclust v5.5.10 with type='partitional', distance='dtw_basic', and centroid='pam' on these scaled gene values to identify co-regulated genes (59). To choose the number of clusters (k) we performed clustering with all k values from 2 through 35 and then selected the earliest k at which several metrics of clustering metrics began to plateau (Fig. S8). We downloaded the Kah ChIP-seq data from the ENCODE portal (60) with the following identifier: ENCSR161YRO.

Gene pathway enrichment analysis
We were interested in relating various gene sets to known biological processes. For this task we used FlyEnrichr to perform gene set enrichment analyses (61,62). Gene sets were uploaded to the FlyEnrichr server with the enrichR version 3.0 package for R available on CRAN. We restricted our results to enrichments in the "GO Biological Process 2018" database and the "RNAi Screens from GenomeRNAi 2017" database.

RNA velocity analysis
We randomly sampled 20,000 cells from 3 adjacent time windows (10-12 hr, 12-14 hr, and 14-16 hr) and then randomly grouped cells into 100 meta cell bins and aggregated all reads per meta cell bin. We then used the scVelo software package version 0.2.2 (63) and the standard processing pipeline to estimate the RNA velocity graph, which we then visualized as a directional vector in the first two principal components space, thus preserving the linear interpretability. The script for performing this analysis is included in our data sharing page.

sci-ATAC-seq3 library construction and sequencing
To create the sci-ATAC-seq3 libraries, we followed the protocol from (6). As previously described, frozen fixed nuclei were thawed, re-permeabilized in Omni lysis buffer (64), and diluted in ATAC-resuspension buffer (RSB) buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MaCl2) supplemented with 0.1% Tween-20. We profiled 11 samples corresponding to developmental time windows in 2 experiments and one barnyard sample per experiment. For each time window, 50,000 cells were deposited across wells of a LoBind 96-well plate, 8 time windows per experiment (across 11 wells) in addition to a barnyard sample (across 8 wells) which is a mixture of mouse CH12-LX and human GM12878 cell lines. Re-permeabilized nuclei were tagmented with Tn5 enzyme at 55°C for 30 min then stop reaction buffer (40 mM EDTA with 1 mM Spermidine) were added afterwards and incubated at 37°C for 15 min. Pooled tagmented nuclei from each time window were pooled, pelleted and washed. Phosphorylation master mix [1X polynucleotide kinase (PNK) buffer, 1 mM rATP, T4 PNK] was added to the washed tagmented nuclei and incubated at 37°C for 30 min. Next, ligation master mix [1X T7 ligase buffer, N5_splint, T7 DNA ligase enzyme] was added directly to the phosphorylation reaction followed by 384 distinct N5_oligos then incubated at 25°C for 1 hour. Stop reaction mixture was added to the ligation reaction and incubated at 37°C for 15 min. All wells were pooled then transferred into a 50-ml falcon tube, pelleted and washed with ATAC-RSB with 0.1% Tween-20. N7 ligation master mix [1X T7 ligase buffer, N7_splint, T7 DNA ligase] were added to the washed pellet and aliquoted into four 96-well LoBind plates. 384 distinct N7_oligos were added across four plates of N7 ligation and incubated at 25°C for 1 hour, then stop reaction mix were added for another 37°C for 15 min incubation. Afterwards, all wells were pooled and transferred to a clean 50-ml falcon tube then washed in ATAC-RSB with 0.1% Tween-20 before resuspending in Qiagen EB buffer. The ligated and washed nuclei were counted and aliquoted at 1000-3000 nuclei per well across four 96-well LoBind plates. Proteinase K and 1% SDS were added to the nuclei to reverse crosslink and incubated at 65°C for 16 hours. To determine the optimal cycle number, a test amplification was performed and monitored with SYBR green on a handful of wells of a reversed crosslink plate. The remaining plates were processed on the basis of the test PCR result. PCR amplifications were performed using NEBNext High Fidelity 2X PCR Master Mix, BSA [bovine serum albumin], indexed P5 oligo, and indexed P7 oligo. All wells were pooled and purified with Zymo Clean & Concentrate-5 and further purified with 1X AMPure bead to get rid of any remaining primers and adapter dimer. Purified libraries were quantified on an Agilent 4200 Tapestation System using D5000 reagents and screentape. Libraries were then diluted to 2 nM for sequencing using a custom recipe and primers on a NextSeq 500 to assess library complexity then further sequenced on an Illumina NovaSeq 6000 sequencer with custom sequencing recipe and primers.
Data processing for sci-ATAC-seq3 The sci-ATAC-seq3 raw reads were processed using the pipeline described in (6). Reads were mapped to the dm6 reference genome. The non-duplicate fragments are used for peak calling with MACS2 (65) in each sample, and then merged together with bedtools (66). We generate sparse matrices counting reads falling into each 5 kb window in the genome for cells passing a samplespecific threshold for each sample. We also generated sparse matrices counting reads falling into the merged peak set and into gene bodies plus 2kb upstream regions (proximal gene activity matrices). The barnyard sample was mapped to the merged hg19-mm9 reference genome to estimate the collision (two cells receiving the same barcode by chance) rate in the experiment.

Dimensionality reduction and clustering
The downstream analysis steps also closely follow the pipeline described in (6). We binarized the window-by-cell matrices for downstream analysis. We merge the binary matrices for time windows profiled in both experiments. We exclude peaks on sex chromosomes and peaks accessible in less than 1% of cells. We use latent semantic indexing with log-scaled term frequency to normalize the binary matrices. We use singular value decomposition on the normalized matrices to generate principal components (PCs). Retaining the 2nd through 50th PCs (discarding the 1st PC, which is generally correlated with read depth) and applying L2-normalization on the PCA matrix, we generate a low-dimensional representation for each time window. The normalized PCA matrices are used for Louvain clustering and UMAP (min.dist=0.3) as implemented in Seurat v3 (56). For the Louvain clustering, we used resolution of 0.3 for the first round clustering and we varied the resolution parameter per time window for the final clustering after cluster specific peak calling. To select the clustering resolution parameter for the scATAC-seq clustering in each time window, we computed clusters with various clustering resolutions and selected the one at which the proportion of variance explained by the clustering plateaued (elbow method). We did not observe batch effects between the two experiments ( Fig. S2C) in any developmental time window and thus did not run batch removal algorithms.

Doublet identification
The individual experiments had low estimated rates of doublets (Fig. S2). Based on the barnyard samples we expect 1.8% of doublets within our dataset (Fig. S2). We use a modified version of the scrublet algorithm (67) to calculate a per cell doublet score and set a threshold of the 95th percentile. For each developmental time window, cells with doublet scores above the threshold and clusters with over 25% cells of above the threshold are removed. The remaining cells for each developmental time window are re-embedded and re-clustered using the pipeline described above.

Cluster-specific peak calling
In order to generate a comprehensive set of peaks, we split the fragments files by cells in each Louvain cluster and call cluster-specific peaks with MACS2. The summit of each peak is extended to 150bp and then merged into a master peak set with bedtools. The peak-by-cell matrices are recounted with the new peak set, and the cells for each developmental time window are re-embedded and re-clustered using the pipeline described above. In total, we identified 110,185 peaks (median length = 217 bp) that collectively cover 22% of the dm6 genome. We compared this peak set with sets of known elements, including annotated TSS sites (extended 2 kb upstream and 200 bp downstream), peaks identified in (12) For each pair of sets of elements, we calculate the portion of elements overlapping elements in the other set by 1 bp overlap, and vice versa. The peaks in this study overlap over 85% of each set of the known elements, while the known elements overlap less than 50% of the peaks in this study.

Global embedding
In order to visualize global trends of development based on the regulatory landscape, we randomly sampled 20,000 cells from each 2-hour developmental time window and 40,000 cells from each 4-hour developmental time window, merged the chromatin accessibility profiles and embedded the cells together using the pipeline described above.

Cell type annotation
To transfer cell type labels from the sci-ATAC-seq dataset in (12), we use the integration pipeline implemented in Signac V1 (68) on five developmental time windows (2-4, 4-8, 6-10, 8-12, 10-14) that overlap one of the three time windows (2-4, 6-8, 10-12) profiled in (12). We collapse the cell type subsets (e.g. CNS A, CNS B to CNS) in (12) to a set of 37 cell type labels and count the number of cells in our dataset with each cell type label in each cluster. We assigned the most prominent cell type label to each cluster. For most clusters we could unambiguously assign a cell type label. We propagate the cell type labels to all developmental time windows through connections identified in the developmental tree described in the 'Reconstruction of the developmental tree' section. Since our study generated a much larger dataset, we expect to observe cell types that are not identified in (12). Therefore, we also refined the transferred labels based on differentially accessible (DA) peaks and differentially expressed (DE) genes. DA peaks are calculated from the binary peak matrix with the FindMarkers() function as implemented in Seurat V3 with test.use='LR' and logfc. We use a Fisher's exact test to look for 'terms' in CAD that are enriched in the foreground set compared to the background set. We used terms with adjusted p-value less than 0.05 to refine the transferred labels. DE genes are calculated from the proximal gene activity matrix with the FindMarkers() function as implemented in Seurat V3. We look for 'terms' in the BDGP that are enriched with the same procedure as in sci-RNA-seq annotation.

Motif analysis
We calculate a per-cell motif activity score for known motifs with PWMs in the CisBP database (69) using chromVAR (70). Within each cluster, chromVAR calculates a bias-corrected deviation of accessibility of each cell from the average accessibility in all the cells. We also use Homer (71) to identify motifs enriched in the mesoderm-specific gene regions. We take the peaks open in over 2% of mesoderm cells and that overlap the 1kb-10kb region upstream of the TSS of the four clusters of genes in Fig. 4C as the target, peaks open in less than 2% of mesoderm cells and overlapping the 1kb-10kb region upstream of the TSS of non-mesoderm genes as the background, and run findMotifsGenome with the parameters -cpg -size given (Table S7).

Inferring developmental time
To estimate the developmental time stamp for each single cell, we trained lasso linear and neural net-based models using the window-by-cell read counts matrix for sci-ATAC-seq and the expression read count matrix for sci-RNA-seq to predict the midpoint of each developmental time window (e.g., 7 for the 6-8 hours developmental time window). First, we equally subsampled cells for each time window to normalize the number of included cells per hour of collection time (i.e. 4 hour windows had twice as many nuclei as two hour windows). Following even subsampling, all genes and peaks that were constant values were removed. Then, we split the equally time subsampled data into 11 partitions of cells. The first 10 partitions were used for 10-fold cross validation to test many different model parameter choices (outlined below for each model type) and the final held out 11th partition was used as a test data set to evaluate the final models. To be clear, all model fitting described below was performed with the first 10 partitions of cells.
The lasso model was trained with cv.glmnet() as implemented in glmnet (72). In this case, the only parameter being fit by cv.glmnet() is the strength of the lasso penalty. We use the trained model to infer the developmental time for all the cells with predict(), setting s='lambda.min'.
We used tensorflow v 2.6.1 (73,74) to fit all neural net models using a fully connected, feedforward 6-layer (4 hidden) neural network. For the first hidden layer we included variable l1 or l2 regularization with the 'kernel_regularizer' parameter, and in the last layer we optionally constrained the output to within 0-20 with either a sigmoid or tanh activation function. All hidden layers used relu activation functions. The ATAC and RNA model were very similar, and only differed in the number of units per hidden layer. For ATAC there were 10, 100, 60, 20, and for RNA there were 5, 100, 50, 20, 1 units per hidden layer. Moreover, the RNA model input was scaled with a 'Normalization' layer from tensorflow, whereas the ATAC model was not. Finally, we used 'SparseTensor' to encode the ATAC data.
For fitting these models we optionally used either the standard mean squared error from the center hour of the collection window (MSE) or a custom loss function based on MSE except the error is set to 0 if nuclei are placed within the correct collection window. We used 10-fold cross validation to estimate the generalization error of these models with a variety of parameterization choices for the L1 penalty [values=1, 0.1, 0.001, 0.0001, 0.00001, 0.000001, 0], L2 penalty [values=1, 0.1, 0.001, 0.0001, 0.00001, 0.000001, 0], activation function (linear, sigmoid, tanh), and loss (MSE, custom). We selected the two best performing models (one using MSE and the other using the custom loss) based on the best median MSE and the proportion of nuclei placed into the correct collection window across all the 10-folds.
We split the developmental time frame of 0-20 hours into 10 non overlapping 2 hour inferred time windows and reassign cells to each inferred time window based on their inferred times. The cells for each inferred time window are re-embedded and re-clustered using the pipeline described above.
Scripts for arranging the data and fitting the models can be found on our supplementary data sharing website.
Initially, when we tested these time inference models with bulk RNA/ATAC/DNase-seq libraries, the age estimates were outside the hour range of this experiment even though they produced the correct temporal ordering. We suspected that these large inferred time values were due to the differences in the read depth of the bulk libraries versus single-cell nuclei. To address this we simulated subsampling reads the median number of reads per nuclei (377 UMI for RNA; 5294 unique reads for ATAC) with the 'rmultinom' function from base R with the parameter 'prob' set to the read counts of the bulk library. To integrate over sampling error, from each bulk library we repeated the subsampling process 100 times and then averaged over the model-inferred predictions. This process of adjusting the read distribution of the bulk libraries to match the singlecell nuclei resulted in time predictions that were in the same scale as our time course experiment.

Inferring nuclei sex
To infer whether each nuclei was XX or XY we used the proportion of chrX-mapped sci-ATAC-seq reads as a summary statistic and identified two distinct populations Fig. S11H. After filtering out nuclei with 0 chrX-mapped reads or if they were in the tails of the distribution (prop. X > 0.07 and prop. X < 0.22), we fit a gaussian mixture model with the 'normalmixEM' function from the mixtools package version 1.2.0 (75) and the parameter 'k'=2. The gaussian mixture model was successfully able to separate these two populations of cells into a set with likely XX genotype (more proportion of chrX-mapped reads) compared to a set of cells that were likely XY genotype (less proportion of chrX-mapped reads). To verify that these classifications were accurate, we could validate that the XY genotyped cells were indeed male cells by verifying that they were enriched for reads mapping to the Y chromosome. Cells with a >95% probability of being classified as XX, based on this mixture model fit, only 8% of these nuclei had 0.05% or more proportion of Y-mapped reads. In contrast, for confidently predicted male cells (i.e. cells with >95% probability of being classified as XY), 42% had 0.05% or more Y-mapped reads. The full annotations of nuclei genotype are now included in the supplementary tables. Unfortunately, this analysis was not possible with the sci-RNA-seq data likely due to dosage compensation and too few reads per nuclei.

Reconstruction of the developmental tree
To connect each cell state observed in a predicted time window with its most probable ancestor cell state from the previous predicted time window, we use the k-NN approach described in (76). We took cells from neighboring predicted time windows and co-embedded them using the pipeline described above. For cells in the later inferred time window, we identify ten k-NN cells in the previous inferred time window based on 50 PCs. The edge weights connecting the clusters in each inferred time window to clusters in the previous inferred time windows are set to the percent of cells in each cluster that has majority k-NNs from a cluster in the previous inferred time window. Edge weights < 0.2 are removed. Branches with only one leaf and nodes in the 18-20 hours inferred time window are pruned for the developmental tree shown in Fig. 3C-D.

Spatial analysis
A tranche of spatial data was recently released including a study on Drosophila using SpaTial Enhanced REsolution Omics-sequencing (Stereo-seq) (29), which was collected from late-stage embryos and all stages of larvae. The late-stage embryos were from corresponding windows as our embryo samples 14-16 h and 16-18 h after egg laying. This technology is not single-cell based, but instead patterned DNA nanoballs are placed on a slide that then capture RNA transcripts which are then sequenced and associated back to the slide spatial position of the nanoball slide spatial position. With an anchor-based integration FindTransferAnchors() and TransferData() from Seurat v3 (56), we performed probabilistic label transfer to assign our cluster labels to each patterned nanoball's spatial location. Using the assigned annotations of tissues from the original study as reference above, we observe a correspondence with our cluster annotations (Fig. S7E).
Connecting cell states in the sci-ATAC-seq dataset to those in the sci-RNA-seq dataset In order to identify matching cell states in the sci-ATAC-seq and sci-RNA-seq datasets, we implemented the non-negative least squares (NNLS) approach as described in (1,6). For the sci-RNA-seq dataset, we calculate an aggregate expression vector for each cluster in each time window by summing the log-transformed normalized UMI counts of all cells in that cluster. For the sci-ATAC-seq dataset we take the proximal gene activity matrix and calculate a similar aggregate activity vector for each cluster. Then we apply non-negative least squares (NNLS) regression to predict gene expression in a target cluster in the sci-RNA-seq dataset based on the gene activity of all clusters in the corresponding time window in the sci-ATAC-seq dataset. The resulting β matrix is denoted !" ∈ # × & where i and j are the number of RNA and ATAC clusters respectively. We then repeat the analysis predicting gene activity with gene expression to obtain "! ∈ & × # . For each pair of clusters from the two datasets, we calculate a final beta value based on the element-wise multiplication of the two beta value matrices from NNLS: β = 2( !" + 0.001)( "! ' + 0.001). Similar clusters, based on patterns of gene expression and the sum of ATACseq reads around a gene, in the two datasets have higher beta values.

Determining the relationship between TF expression and associated motif accessibility for NNLSlinked clusters
Relying on the NNLS-based links between clusters of ATAC and RNA data, we next set out to determine correlation between TF-associated motif accessibility and TF-associated gene expression. Presumably TFs with strong correlations are active, with positive correlations indicating TFs likely to be activators of gene expression and negative correlations indicating TFs likely to be repressors of gene expression. For each cluster we averaged the expression of each TF across all cells and averaged the associated motif accessibility score (as described previously) across all cells. Next for each RNA cluster we chose the best associated ATAC cluster as determined through NNLS analysis. For comparison we also performed the vice versa association pairing each ATAC cluster with the top associated RNA cluster and the inferred correlations were similar (Fig. S9G). We then computed the spearman correlation and Pearson's R correlation between motif accessibility score and gene expression for each TF with a known associated motif. Additionally, we computed these correlations for each gene at each two hour inferred time window. For the analysis visualized in Fig. 5D we fit a linear regression model with 'lm()' in R that predicts the TF motif-associated ATAC-seq activity (from chromVAR) from an interaction term including the expression of the related TF, the germ layer of the cluster, and time window. Prior to fitting this model, the TF expression values were scaled and normalized to have 0 mean and an sd of 1 with 'scale()'. Additionally, we weighed the model observations by the NNLS correlation value representing the strength of the link between the ATAC and the RNA cluster. Intuitively, weighting the observations will increase the contribution from clusters that are strongly linked compared to clusters that have a weak link between ATAC and RNA.

Gene regulatory network analysis
To integrate our transcriptome and chromatin accessibility data, we first created subsampled Seurat objects (5,000 cells) for cells at 10-12 hr and log normalized the RNA counts and ATAC gene activities. We identified variable features for each object using Seurat 'SelectIntegrationFeatures' function and used the features to perform Canonical Correlation Analysis (CCA) with Seurat RunCCA function, in order to create a common embedding for our independently-assayed ATAC and RNA modalities. In CCA space, we used FigR 'pairCells' function to identify pairs of ATAC-RNA cells by geodesic distance-based pairing (38). Gene expression and peak accessibility counts from paired cells were used as input for FigR 'runGenePeakcorr' in order to identify significant (p-value < 0.05) peak-gene associations. The 'runGenePeakcorr' function was slightly modified to accept the Drosophila melanogaster dm6 genome (Bioconductor library BSgenome.Dmelanogaster.UCSC.dm6) as a valid input. At this point, the DORC scores and the original RNA counts were smoothed with FigR function 'smoothScoresNN', and fed together with the original ATAC peak counts and the significant peakgene associations into FigR function 'runFigRGRN', for inference of the gene regulatory networks. The 'runFigRGRN' was slightly modified to accept Drosophila melanogaster CisBP PWMs as a valid input. For generating the plots presented in Figure S11, we ranked TFs by average regulation score using the FigR function 'rankDrivers', while TFs regulating individual genes were retrieved with the function 'plotDrivers'. Domains of regulatory chromatin (DORCs) were identified with FigR function 'dorcJPlot' with 'cutoff' = 10 (at least 10 significantly linked peaks to be called a DORC) (Fig. S11C).

Supplementary Text Supplementary Note 1: Diversification of non-myogenic mesoderm
We examined the diversification of the non-myogenic mesoderm trajectories in more detail. The non-myogenic mesoderm includes the fat body and haemocyte lineages, which in the embryo includes plasmatocytes and crystal cells. Previous genetic studies demonstrated that the fat body develops from the trunk mesoderm (77), similar to the somatic muscle, while the embryonic haemocytes originate from the head mesoderm (78). In agreement with this, and without any prior knowledge, the scATAC-derived graphs suggest that the fat body shares a developmental origin with somatic muscle, whereas the haemocytes originate from a separate mesodermal trajectory.
To explore these trajectories at finer resolution, we isolated and re-clustered all cells annotated as plasmatocytes, fat body and crystal cells, using scRNA data from 6-18 hrs. This revealed 17 subclusters, with a large group of early cells (right, clusters 1, 4, 13, 6, 7), that diversifies at 8-12 hrs. The early clusters are defined by the expression of several transcription factors (TFs) including Ultrabithorax (Ubx), restricted to the earlier stages, and homothorax (hth) and Zn finger homeodomain 1 (zfh1), expressed from early to late stages in development. All branches express common haemocyte and fat body marker genes such as the collagenase Col4a1, the procollagen lysyl hydroxylase Plod and the extracellular matrix gene Tig. These branches correspond to sub-trajectories for plasmatocytes (clusters 8,9,12,14), fat body (clusters 0, 3, 10, 11, 16), and additional two muscle clusters (cluster 2, 5). Finally, the crystal cells (a relatively rare cell type, expressing the marker genes Lozenge, PPO1, PPO2) are clearly separated and disconnected from the rest (cluster 15), suggesting that they may have a different developmental origin or, alternatively, that they deviate from a common precursor population earlier than 6 hrs.
Plasmatocytes represent ~90% of the embryonic haemocytes, and are phagocytic cells involved in clearing apoptotic cells (79), which requires the expression of the scavenger receptors crq and drpr (80, 81). We observe both genes, in addition to other scavenger receptors (e.g. Nimrod C4 and NimB4, specific to later embryonic stages), expressed along the plasmatocyte subtrajectory. The second function of plasmatocytes is the secretion of extracellular matrix (ECM) proteins, which is also evident from our single cell trajectory revealing dynamic expression of a large repertoire of ECM proteins, such as Papilin (Ppn), Peroxidasin (Pxn), Glutactin (Glt), Tiggrin (Tig), basement membrane-associated SPARC, laminin A (lanA), and two collagenase IV molecules -Col4a1 and Viking (vkg). The second major branch is the fat body, which is thought to have similar roles to the mammalian liver and adipose tissues, in addition to having an essential role in immune responses, as the main source of antimicrobial peptides (AMPs) (82). We identified 289 genes significantly associated with specific subclusters that highlight the dynamic differentiation of the fat body. For example, the expression of genes involved in lipid transport (e.g. apolpp and apoltp (83,84)), increases with the maturation of the fat body cells.
Supplementary Note 2: Tracing dynamic gene modules across development The analysis steps associated with mesoderm gene clustering of Fig. 4 were motivated to identify the TFs driving mesoderm-specific expression variability and determine whether we were capable of ordering putative regulatory events (e.g. Kah/Mhc). Of course, there are different directions to approach this or related questions. Therefore, we performed a complementary analysis by clustering accessible regions variable in the mesoderm tissue (Fig. S8D). The clustering approach was the same as in Fig. 4 but using variable chromatin accessibility regions from the set of mesoderm annotated nuclei. This analysis identified a similar set of 4 clusters of accessibility elements that are likely involved with regulatory patterning linked to mesoderm differentiation. Additionally, we performed a motif enrichment analysis on these peaks and found 483 significant associations between a peak group and TF (q-value < 1 x 10 -3 and a match present in at least 1% of target peaks) from 192 unique TFs. Tying this back to the mesoderm expression analysis, we found that 40 of these TFs were in the set of genes that were variably expressed in the mesoderm tissue. These significant TF associations and the clustered peaks are available from our supplementary data website and Table S12-S13.

Supplementary Note 3: Clarification of selected in situ genes
The five genes that we selected for follow-up in situ validation were based on results from our initial nuclei age inference model that used lasso regression to order cells. However, the two genes indicated with an asterisk in Fig. 4E (CG18766, CG8034) were missing from the mesoderm gene clustering analysis based on an updated nuclei age model that used deep learning as they were not present in the initial stringent mesoderm subset based on the top 5k variable genes. Upon manual inclusion of these two missing genes into the set of 5k variable genes (so now 5,002 genes), the genes were highly variable following normalization and averaging across time windows (they were included in the second set of 2k variable genes) and were grouped into the correct temporal cluster. Therefore, we have included these genes in Fig. 4E grouped by the original mesoderm clustering (based on the initial lasso-based age inference model).

Supplementary Note 4: Nomination of fruitless as a repressive regulator in neuroectoderm
In the neuroectoderm, Fruitless (Fru) is associated with decreasing chromatin accessibility; i.e. increasing fru expression led to chromatin regions containing the Fru motif to become less accessible (Fig. 5D, bottom). Intriguingly, Fruitless is associated with specifying the molecular determinants of male courtship behavior by 'masculinizing' specific neurons (85)(86)(87). However, only recently have studies shown that Fru likely acts through the repression of chromatin accessibility at specific regulatory elements (88)(89)(90).